Serbian Spa Waters

Group 6

  • Elizabeth
  • Claude
  • Leah
  • Harry

Introduction

  • Analysis of hydrochemical and radiological data of mineral and thermal waters in Serbia.
  • Original paper:
    • PCA: 4 main factors explaining 74.2% of total variance
      • 3 groups emerged, with 83.3% correct classification.
    • HCA: identified 4 main groups and 8 subgroups

Serbian springs

  • Many mineral, thermal, and thermo-mineral water springs.
  • Some used for drinking water, others as curative spas.
  • Composition varies with the chemistry of the aquifer host rocks, flow conditions, and residence time in the aquifer.
  • Geologically highly complex.

About the dataset

  • 30 observations
  • 1 categorical variable
    • Samples collected from four geological structures:
      1. Hydrogeological basins
      2. Karstic terrains
      3. Volcanogenic massifs
      4. Metamorphic regions

12 numerical variables

Variable         Description                          Units
T                Temperature                          \(^\circ\)C
pH               pH level (acidity/alkalinity)
EC               Electrical conductivity              \(\mu\)S/cm
TS               Total dissolved solids               g/L
Ca\(^{2+}\)      Calcium                              mg/L
Mg\(^{2+}\)      Magnesium                            mg/L
Na\(^{+}\)       Sodium                               mg/L
K\(^{+}\)        Potassium                            mg/L
Cl\(^{-}\)       Chloride                             mg/L
SO\(^{2-}_4\)    Sulfate                              mg/L
HCO\(^{-}_3\)    Bicarbonate                          mg/L
SiO\(_2\)        Silica (dissolved silicon dioxide)   mg/L

Missing radiological variables

  • Six radiological variables not publicly available.

Variable        Description            Units
GA              Gross alpha activity   mBq/L
GB              Gross beta activity    mBq/L
\(^{238}\)U     Uranium isotope        mBq/L
\(^{228}\)Ra    Radium isotope         mBq/L
\(^{226}\)Ra    Radium isotope         mBq/L
\(^{40}\)K      Potassium isotope      mBq/L

Observations per geological structure

Geological Structure      Number of Observations
Hydrogeological Basins    5
Karstic Terrains          5
Volcanogenic Massifs      14
Metamorphic Regions       6

  • Unbalanced data set
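
To put a number on the imbalance, the short sketch below turns the counts from the table into group shares. It also computes the accuracy of a trivial classifier that always predicts the largest group, which is a useful floor when reading any later classification rate. The variable names are illustrative only.

```python
# Observation counts copied from the table above.
counts = {
    "Hydrogeological Basins": 5,
    "Karstic Terrains": 5,
    "Volcanogenic Massifs": 14,
    "Metamorphic Regions": 6,
}

total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

# Accuracy of a trivial rule that always predicts the largest group.
majority_baseline = max(counts.values()) / total

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"Majority-class baseline accuracy: {majority_baseline:.1%}")
```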

Correlation Heatmap

  • Certain variables highly correlated:
    • Electrical conductivity & total dissolved solids
    • Potassium (K\(^{+}\)) & bicarbonate (HCO\(^{-}_3\))
    • Sodium (Na\(^{+}\)) & bicarbonate (HCO\(^{-}_3\))
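
As a rough sketch of how highly correlated pairs can be pulled out of a correlation matrix, the example below computes Pearson correlations for every pair of columns and keeps those above a cutoff. The values are made up for illustration and are not the spa-water measurements; the 0.9 cutoff is likewise an assumption.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative values, NOT the spa-water measurements.
columns = {
    "EC": [450.0, 900.0, 1500.0, 2300.0, 3100.0, 5200.0],
    "TS": [0.30, 0.61, 0.98, 1.55, 2.05, 3.40],
    "pH": [7.9, 6.4, 7.1, 6.8, 7.6, 6.5],
}

threshold = 0.9  # hypothetical cutoff for "highly correlated"
names = list(columns)
high_pairs = []
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            high_pairs.append((a, b, round(r, 3)))

print(high_pairs)
```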

Boxplots

  • Variables standardized to allow comparison.
  • Data set characterized by high variation, right skew, and outliers.
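
Standardization here is assumed to mean z-scoring: subtract each variable's mean and divide by its sample standard deviation, so that variables measured in different units share one scale. A minimal sketch with made-up values:

```python
import statistics

def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return [(v - mean) / sd for v in values]

# Made-up illustrative values, NOT the spa-water measurements.
temperature_c = [14.0, 21.5, 33.0, 48.0, 96.0]
conductivity_us_cm = [450.0, 900.0, 1500.0, 2300.0, 5200.0]

z_temp = standardize(temperature_c)
z_cond = standardize(conductivity_us_cm)

# After standardization both variables have mean 0 and SD 1,
# so their boxplots can share one axis.
print([round(z, 2) for z in z_temp])
print([round(z, 2) for z in z_cond])
```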

Checking for multivariate normality

  • MANOVA residual Mahalanobis distances vs. theoretical \(\chi^2\) distribution as a diagnostic.
  • Assumption of multivariate normality not satisfied.
  • Original paper used a Box-Cox transformation.
    • Box-Cox is sensitive to outliers.
  • Log transformation used here instead.
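
The diagnostic can be sketched as follows: under multivariate normality, squared Mahalanobis distances approximately follow a \(\chi^2\) distribution with as many degrees of freedom as there are variables, so sorted distances are compared with \(\chi^2\) quantiles. This is a simplified two-variable illustration on synthetic log-normal data. It measures distances from the overall mean rather than from MANOVA group means, and it uses the closed-form quantile that holds only for 2 degrees of freedom.

```python
import math
import random

random.seed(0)  # reproducible synthetic data

# Synthetic right-skewed bivariate sample (log-normal), NOT the spa-water data.
n = 30
data = [(math.exp(random.gauss(0, 1)), math.exp(random.gauss(0, 1)))
        for _ in range(n)]

def squared_mahalanobis(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    m = len(points)
    mx = sum(p[0] for p in points) / m
    my = sum(p[1] for p in points) / m
    # Sample covariance matrix entries (n - 1 denominator).
    sxx = sum((p[0] - mx) ** 2 for p in points) / (m - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (m - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (m - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # (dx, dy) S^{-1} (dx, dy)^T using the explicit 2x2 inverse.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances

def chi2_quantile_df2(p):
    """Chi-square quantile for 2 degrees of freedom (CDF is 1 - exp(-x/2))."""
    return -2.0 * math.log(1.0 - p)

d2 = sorted(squared_mahalanobis(data))
theoretical = [chi2_quantile_df2((i + 0.5) / n) for i in range(n)]

# Points far above the line observed = expected suggest non-normality.
for observed, expected in zip(d2[-3:], theoretical[-3:]):
    print(f"observed {observed:.2f}  expected {expected:.2f}")
```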

Re-checking for normality

  • Boxplots far more symmetric after log-transformation.
  • Some outliers still present, but fewer.
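
A small sketch of why the log transformation helps with right skew: it compresses large values far more than small ones, which pulls in the long upper tail. The sample below is made up for illustration, and the skewness measure is the simple moment-based coefficient.

```python
import math

def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2^(3/2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up right-skewed values, NOT the spa-water measurements.
raw = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 8.0, 15.0, 40.0, 120.0]
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)

print(f"skewness before log: {skew_raw:.2f}")
print(f"skewness after log:  {skew_log:.2f}")
```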

  • MANOVA residual Mahalanobis distances follow the theoretical \(\chi^2\) distribution more closely after the transformation.
  • Some deviations remain.

PCA

Component                PC1        PC2        PC3        PC4        PC5        PC6        PC7        PC8
Proportion of Variance   0.4841385  0.1793697  0.1083222  0.0779122  0.0608540  0.0339688  0.0245024  0.0137945
Cumulative Proportion    0.4841385  0.6635082  0.7718303  0.8497425  0.9105965  0.9445654  0.9690677  0.9828622
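
The cumulative row is the running sum of the proportion row. The sketch below reproduces it from the values in the table and finds how many leading components are needed to reach a chosen share of variance. The 80% target is an arbitrary example, not a threshold taken from the analysis.

```python
from itertools import accumulate

# Proportion of variance for PC1..PC8, copied from the table above.
proportion = [0.4841385, 0.1793697, 0.1083222, 0.0779122,
              0.0608540, 0.0339688, 0.0245024, 0.0137945]

cumulative = list(accumulate(proportion))

def components_needed(target):
    """Smallest number of leading PCs whose cumulative share reaches target."""
    for k, share in enumerate(cumulative, start=1):
        if share >= target:
            return k
    return None  # target not reached by the listed components

for k, share in enumerate(cumulative, start=1):
    print(f"PC1-PC{k}: {share:.4f}")
print("PCs needed for 80%:", components_needed(0.80))
```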

Variable           PC1     PC2     PC3     PC4     PC5     PC6     PC7     PC8
log.tempCels    -0.023  -0.329   0.521   0.302  -0.653  -0.015  -0.044   0.284
log.pH          -0.304  -0.294  -0.187   0.077   0.349   0.314  -0.186   0.707
log.elec.Cond    0.378  -0.197  -0.089   0.009   0.077  -0.275  -0.224   0.153
log.totSolid     0.367  -0.212  -0.172  -0.045   0.041  -0.324  -0.212   0.117
log.Ca2          0.187   0.551   0.083   0.065  -0.020   0.394   0.251   0.330
log.Mg2          0.277   0.438   0.049  -0.140  -0.030  -0.398  -0.020   0.417
log.Na           0.345  -0.301  -0.102   0.102   0.151   0.318   0.073  -0.213
log.K            0.376   0.041   0.140  -0.080  -0.103   0.474  -0.196  -0.113
log.Cl           0.271  -0.076  -0.125   0.676   0.160  -0.111   0.525   0.046
log.SO2          0.053   0.060   0.717   0.163   0.597  -0.068  -0.250  -0.094
log.HCO          0.388  -0.004  -0.118  -0.053  -0.124   0.253  -0.302   0.062
log.SiO          0.174  -0.353   0.270  -0.611   0.110   0.014   0.577   0.163
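
One way to read a loading column is to rank variables by absolute loading. The sketch below does this for PC1 using the values from the table. It also checks that the column has Euclidean length close to 1, assuming the table reports unit-length eigenvectors; this is an assumption about how the loadings were scaled.

```python
import math

# PC1 loadings copied from the table above.
pc1 = {
    "log.tempCels": -0.023, "log.pH": -0.304, "log.elec.Cond": 0.378,
    "log.totSolid": 0.367, "log.Ca2": 0.187, "log.Mg2": 0.277,
    "log.Na": 0.345, "log.K": 0.376, "log.Cl": 0.271,
    "log.SO2": 0.053, "log.HCO": 0.388, "log.SiO": 0.174,
}

# If the column is a unit-length eigenvector, its norm should be 1
# up to the three-decimal rounding of the printed values.
norm = math.sqrt(sum(v ** 2 for v in pc1.values()))

# Rank variables by the size of their contribution to PC1.
ranked = sorted(pc1, key=lambda name: abs(pc1[name]), reverse=True)

print(f"norm of PC1 loadings: {norm:.3f}")
print("largest |loading| on PC1:", ranked[:5])
```

On these values PC1 is dominated by bicarbonate, electrical conductivity, potassium, total dissolved solids, and sodium, which matches the highly correlated pairs noted in the correlation heatmap.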

PC2 vs. PC1 with outliers

Factor Analysis: Factor Adequacy test

Factor Analysis Loadings

Factor Analysis Variance Explained

Factor      SS.Loadings   Proportion.Var   Cumulative.Var
Factor 1    4.173         0.348            0.348
Factor 2    2.843         0.237            0.585
Factor 3    1.378         0.115            0.700
Factor 4    1.120         0.093            0.793
Factor 5    0.379         0.032            0.824

Factor Adequacy Test

Statistic            Value
Chi-square           23.25
Degrees of freedom   16
p-value              0.107
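
The numbers in both tables can be cross-checked with a little arithmetic. The sketch below assumes 12 variables and 5 factors, takes Proportion.Var as SS.Loadings divided by the number of variables, and uses the usual degrees-of-freedom formula for the likelihood-ratio test of a maximum-likelihood factor model, \(((p-k)^2-(p+k))/2\). The p-value uses a closed form for the \(\chi^2\) upper tail that is valid only for even degrees of freedom. Whether the original fit used exactly this test is an assumption.

```python
import math

n_variables = 12  # numerical variables in the data set
n_factors = 5     # factors listed in the variance table

# SS loadings copied from the variance table above.
ss_loadings = [4.173, 2.843, 1.378, 1.120, 0.379]
proportion_var = [ss / n_variables for ss in ss_loadings]

# Degrees of freedom: ((p - k)^2 - (p + k)) / 2
df = ((n_variables - n_factors) ** 2 - (n_variables + n_factors)) // 2

def chi2_sf_even_df(x, dof):
    """Upper-tail chi-square probability; closed form for even dof only."""
    if dof % 2 != 0:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

p_value = chi2_sf_even_df(23.25, df)

print([round(p, 3) for p in proportion_var])
print("df =", df)
print(f"p-value = {p_value:.3f}")
```

At a conventional 5% level, a p-value above 0.05 means the hypothesis that five factors are sufficient is not rejected.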

Factor Analysis Diagram

Confusion matrix